SYSTEM AND METHOD FOR ADAPTIVELY DESKEWING
PARALLEL DATA SIGNALS RELATIVE TO A CLOCK 




Technical Field of the Invention 
The present invention relates generally to signaling between electrical
components and in particular to a system and method for adaptively deskewing parallel
data signals relative to a clock.

Background of the Invention 
In the multiprocessor computer systems environment, clock pulses from a
common source are distributed for controlling many widely separated circuit modules.
Time delays associated with the passage of clock and data signals through parallel, but
not identical, paths are not uniform; signals can arrive at their destination in skewed
time relation to each other. Source synchronous clocking is often utilized whereby
parallel data signals and a synchronous clock are distributed to widely separated circuit
modules. The forwarded clock acts as a capture clock for data at the destination. The
capture clock edge is optimally positioned between successive data edges so the
receiving capture device has equal setup and hold time margins. Often, finite time
delay is added to each signal to correct for skew and to optimally position the forwarded
capture clock edge relative to the deskewed data edges.

It is possible to limit a certain amount of signal skew by applying careful
attention to layout and design. Examples of methods to reduce clock pulse skew are
shown in U.S. Pat. Nos. 4,514,749 by Skoji and 4,926,066 by Maimi et al. Such
methods fail, however, to correct for skew from various divergent clock pulse path
interconnections. In addition, such skew compensations, once implemented, cannot
accommodate variations in skew caused by such factors as component aging, operating
environment variations, and so forth.

Within a computer system, data is passed from register to register, with varying
amounts of processing performed between registers. Registers store data present at their
inputs either at a system clock transition or during a phase of the system clock. Skew in
the system clock signal impacts register-to-register transfers, i.e., skew may cause a
register to store data either before it has become valid or after it is no longer valid.

As system clock periods shrink there is increasing pressure on the computer
architect to increase the amount of determinism in the system design. Clock skew, like
setup time, hold time and propagation delay, increases the amount of time that data is in
an indeterminable state. System designers must be careful that this indeterminable state
does not fall within the sampling window of a register in order to preserve data
integrity.

For the reasons stated above, and for other reasons stated below which will
become apparent to those skilled in the art upon reading and understanding the present
specification, there is a need in the art for a system and method for reducing skew
between parallel signals within electrical systems.

Summary of the Invention 
The above-mentioned problems involved in the parallel transfer of data at high
speeds are addressed by the present invention and will be understood by reading and
studying the following specification.

According to one aspect of the present invention, a system and method of
reducing skew between a plurality of signals transmitted with a transmit clock is
described. Skew is detected between the received transmit clock and each of the received
data signals. Delay is added to the clock or to one or more of the plurality of data
signals to compensate for the detected skew. Each of the plurality of delayed signals is
compared to a reference signal to detect changes in the skew. The delay added to each
of the plurality of delayed signals is updated to adapt to changes in the detected skew.

According to another aspect of the present invention, a circuit is described for
reducing skew between a plurality of signals transmitted with a channel clock, wherein
the plurality of signals includes a first and a second signal. The circuit includes first and
second data capture circuits, a delay line controller and a channel clock interface. The
first data capture circuit is connected to the first signal and includes a first delay line and
a first skew detection circuit connected to the first delay line. The second data capture
circuit is connected to the second signal and includes a second delay line and a second
skew detection circuit connected to the second delay line. The delay line controller is
connected to the first and second delay lines and to the first and second skew detection
circuits. The delay line controller receives skew indicator signals representing skew
from each of said first and second skew detection circuits and controls delay added by
said first and second delay lines. The channel clock interface is connected to the first
and second skew detection circuits. The channel clock interface frequency doubles the
channel clock to form a doubled channel clock, which is then used by the first and
second skew detection circuits to detect skew.

According to yet another aspect of the present invention, a skew detection circuit
includes a phase comparator, wherein the phase comparator compares phase of an input
signal to a clock signal to generate a clock early signal and a data early signal.

According to yet another aspect of the present invention, a delay line controller
for controlling a plurality of delay lines, wherein each delay line receives an input signal
and generates a delayed signal as a function of the input signal, includes a plurality of
skew indicator signal inputs, wherein each skew indicator signal input is capable of
receiving a skew indicator signal reflecting skew between one of the delayed input
signals and a reference signal, a digital filter connected to each of said plurality of skew
indicator signal inputs, wherein each digital filter generates a delay control signal
associated with one of the delayed input signals, and control logic for controlling the
plurality of delay lines as a function of the delay control signals.

According to yet another aspect of the present invention, a communication
system includes a transmitter and a receiver. The transmitter transmits a plurality of
signals in parallel, wherein the plurality of signals includes a first and a second signal.
The receiver includes a deskewing circuit, wherein the deskewing circuit includes a first
data capture circuit connected to the first signal, wherein the first data capture circuit
includes a first delay line and a first skew detection circuit connected to the first delay
line, a second data capture circuit connected to the second signal, wherein the second
data capture circuit includes a second delay line and a second skew detection circuit
connected to the second delay line, a delay line controller connected to the first and
second delay lines and the first and second skew detection circuits, wherein the delay
line controller receives skew indicator signals representing skew from each of said first
and second skew detection circuits and controls delay added by said first and second
delay lines, and a channel clock interface connected to the first and second skew
detection circuits, wherein said channel clock interface frequency doubles the channel
clock to form a doubled channel clock. The first and second skew detection circuits
detect skew as a function of the doubled channel clock.

According to yet another aspect of the present invention, a communication
interface for use on an integrated circuit includes a transmitter and a receiver. The
transmitter transmits a plurality of signals in parallel, wherein the plurality of signals
includes a first and a second signal. The receiver includes a deskewing circuit, wherein
the deskewing circuit includes a first data capture circuit connected to the first signal,
wherein the first data capture circuit includes a first delay line and a first skew detection
circuit connected to the first delay line, a second data capture circuit connected to the
second signal, wherein the second data capture circuit includes a second delay line and a
second skew detection circuit connected to the second delay line, a delay line controller
connected to the first and second delay lines and the first and second skew detection
circuits, wherein the delay line controller receives skew indicator signals representing
skew from each of said first and second skew detection circuits and controls delay
added by said first and second delay lines, and a channel clock interface connected to
the first and second skew detection circuits, wherein said channel clock interface
frequency doubles the channel clock to form a doubled channel clock. The first and
second skew detection circuits detect skew as a function of the doubled channel clock.

According to yet another aspect of the present invention, a system and method
for establishing the phase relationship between a plurality of signals, including a first
signal, is described. Signal deskewing circuitry is initialized, wherein initializing
includes driving the circuitry with a predefined sequence of data edges. A phase
comparator is driven with a clock having 2 edges per data bit and a 50% duty cycle,
wherein driving includes sensing the clock and determining an error signal indicating
drift from the 50% duty cycle. A phase relationship is determined between the first
signal and the clock and delay for the first signal is set as a function of the phase
relationship. The delay is then modified as a function of changes in the phase
relationship.

According to yet another aspect of the present invention, in a system having a
channel clock, a plurality of channel signals and delay lines for delaying the channel
clock and the plurality of channel signals, a system and method is described for
adaptively deskewing delays between the plurality of channel signals. Phase
relationships are determined between the channel clock and the plurality of channel
signals, wherein determining includes generating skew indicator signals for each of the
plurality of signals. Each skew indicator signal is filtered to reduce jitter and a Data
Minus Clock (DMC) register is initialized for each channel signal. The value of the
DMC register corresponding to a particular signal is increased if that signal has arrived
early with respect to a reference signal; the value of the DMC register corresponding to
a particular signal is decreased if that signal has arrived late with respect to the reference
signal. A minimum DMC value from the plurality of DMC registers is determined and,
if the minimum DMC value is greater than zero, the number of delay increments of the
clock delay line is set to a minimum. If the minimum DMC value is less than zero,
delay of the clock delay line is set to the absolute value of the minimum DMC value. A
channel signal delay is calculated for each of the channel signals, wherein calculating a
delay includes determining a difference between the minimum DMC value and the
DMC value corresponding to each of the plurality of channel signals. Each of the
channel signal delay lines is set to delay its channel signal by the channel signal delay
calculated for each of the channel signals.

Brief Description of the Drawings 
In the following drawings, where like numbers indicate similar function,

Figure 1 is a high-level block diagram of a signal deskewing circuit according to
the present invention;

Figure 2 shows one embodiment of the signal deskewing circuit of Figure 1;

Figure 3 illustrates one embodiment of the channel clock interface and delay line
controller of Figure 2;

Figure 4 shows another embodiment of the signal deskewing circuit of Figure 1;

Figure 5 is a timing diagram showing the relationship between signals in a
communications channel;

Figure 6 provides an illustration of a skew incident;

Figure 7 illustrates coarse correction according to the present invention;

Figures 8a-c illustrate a phase comparator which can be used in deskewing
circuits according to the present invention;

Figure 9 illustrates a feedback control system algorithm used to control delay
added to each of the signal and clock lines;

Figure 10 illustrates a digital filter; and

Figure 11 illustrates an electronic system using the signal deskewing circuit of
the present invention.

Detailed Description of the Invention
In the following detailed description of the preferred embodiments, reference is
made to the accompanying drawings which form a part hereof, and in which is shown
by way of illustration specific preferred embodiments in which the inventions may be
practiced. These embodiments are described in sufficient detail to enable those skilled
in the art to practice the invention, and it is to be understood that other embodiments
may be utilized and that logical, mechanical and electrical changes may be made
without departing from the spirit and scope of the present invention. The following
detailed description is, therefore, not to be taken in a limiting sense, and the scope of the
present invention is defined only by the claims.

The system and method described below can be used to reduce skew between
parallel data signals relative to a clock. In one embodiment, skew is reduced relative to
an optimally positioned (orthogonal) capture clock edge as is described below.

Figure 1 is a high-level block diagram of a signal deskewing circuit 100
according to the present invention. As shown in Figure 1, signal deskewing circuit 100
receives two or more data signals 105 and a channel clock 115 from another device and
removes skew between the two or more data signals to create deskewed data signals
116. In one embodiment, signal deskewing circuit 100 includes two or more data
capture circuits 110, a delay line controller 120 and a channel clock interface 130. Each
data capture circuit 110 includes a delay line 112 and a skew detection circuit 114
connected to delay line 112. Delay line controller 120 is connected to each delay line
112 and each skew detection circuit 114. Delay line controller 120 receives skew
indicator signals 118 representing skew from each of the skew detection circuits 114
and controls the delay added by each of the delay lines 112 via control 122. Channel
clock interface 130 receives channel clock 115, doubles its frequency to form doubled
channel clock 132 and drives each skew detection circuit 114 with doubled channel
clock 132.

One embodiment of deskewing circuit 100 is shown in more detail in Figure 2.
In the embodiment shown in Figure 2, data capture circuit 110 includes delay line 112,
skew detection circuit 114 and synchronizer circuit 140. Synchronizer circuit 140 is
used to synchronize data received on data signals 105 to a core clock 150. In one
embodiment, synchronizer circuit 140 includes a serial to parallel converter 142, a
sampler 144 and an output register 146. Serial to parallel converter 142 and sampler
144 are clocked with doubled channel clock 132. Output register 146 is clocked with
core clock 150. In one such embodiment, serial to parallel converter 142 is a four bit
shift register.

In another embodiment (not shown), synchronizer circuit 140 includes a sampler 
144 and an output register 146. Sampler 144 is clocked with doubled channel clock 
132. Output register 146 is clocked with core clock 150. 

In one embodiment, such as is shown in Figure 2, delay line controller 120 is
clocked by core clock 150. In one such embodiment, delay line controller 120 outputs a
sample signal 152 used to drive each skew detection circuit 114 in a method that will be
described below.

In one embodiment, channel clock interface 130 includes a delay line to allow
for additional clock delay. In one such embodiment, delay line controller 120
processes skew indicator signals 118 to minimize the skew between data bits and to
optimally delay the doubled channel clock with respect to a predetermined timing
scheme. Delay line controller 120 determines the amount of delay a signal 105 requires
and through one or more control lines 122 dictates the specific behavior of each delay
line 112.

In one embodiment, each delay line 112 sends a processed channel data signal
108 to skew detection circuit 114. Skew detection circuit 114 compares the phase of the
processed channel data signal 108 to the phase of the doubled channel clock 132
supplied by channel clock interface 130. At the completion of this phase comparison
skew detection circuit 114 generates a skew indicator signal 118 representing skew
detected in each data channel. In one such embodiment, skew indicator signal 118
includes a clock early signal which is active when the reference clock signal edge is
early relative to the data edge and a data early signal which is active when the data edge
is early relative to the reference clock signal edge.

Delay line controller 120 receives the phase comparison information via skew
indicator signal 118 and determines whether additional delay adjustments are required.
Since any individual phase comparison would be subject to significant error due to data
edge jitter, a large number of samples are required before an updated estimate of data
"early" or "late" can be made. (In one embodiment, a minimum of 256 samples are
required before an updated estimate of data "early" or "late" can be made.)

In one embodiment, individual phase comparisons are digitally filtered inside 
delay line controller 120 prior to any delay adjustments being made to the clock or data 
signals. 
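
The idea of withholding a delay adjustment until enough comparisons have been collected can be sketched as follows. This is a simplified majority vote offered only as an illustration; it is not the digital filter of the specification. The function name `estimate_skew` and the string labels `'clock_early'` and `'data_early'` are assumptions; only the 256-sample minimum is taken from the text.

```python
from collections import Counter

MIN_SAMPLES = 256  # minimum number of comparisons cited in the text


def estimate_skew(comparisons):
    """Return 'clock_early', 'data_early', or None when no estimate is made.

    comparisons: iterable of 'clock_early' / 'data_early' strings, one per
    individual phase comparison (hypothetical encoding).
    """
    votes = Counter(comparisons)
    total = votes["clock_early"] + votes["data_early"]
    if total < MIN_SAMPLES:
        return None  # too few samples; data edge jitter could dominate
    if votes["clock_early"] == votes["data_early"]:
        return None  # no clear tendency, so leave the delay unchanged
    return max(("clock_early", "data_early"), key=lambda k: votes[k])
```

With fewer than 256 comparisons the sketch reports no estimate, so a caller would leave the delay lines untouched.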

In the embodiment shown in Figure 2, skew detection circuit 114 is driven by a
signal 132 produced by channel clock interface 130. In one embodiment, channel clock
interface 130 doubles the frequency of channel clock 115 and drives skew detection
circuit 114 with the doubled channel clock 132. In one such embodiment, such as is
shown in Fig. 3, channel clock interface 130 includes a fine tune delay line 160, a
frequency doubler 162 and a fanout 164. Fine tune delay line 160 delays channel clock 115
under control of delay line controller 120. The resulting channel clock is frequency
doubled using frequency doubler 162 and buffered with fanout 164.

In one embodiment, a duty cycle sense circuit 166 is used to ensure that doubled
channel clock 132 has approximately a 50 percent duty cycle. In one such embodiment,
doubled channel clock 132 has a positive duty cycle of 45-55%.

In one embodiment, serial to parallel converter 142 receives data from delay line
112 and converts the data to a parallel format. The data is then shifted, in parallel, to
sampling circuit 144. In one embodiment, sampling circuit 144 samples the parallel
data read from serial to parallel converter 142 such that it can be latched by output
register 146. Output register 146 drives deskewed data signal 116 with a deskewed data
signal synchronized to core clock 150.

Figure 4 provides a more detailed illustration of one embodiment of a signal
deskewing circuit 100 according to the present invention. In the embodiment shown in
Figure 4, delay line 112 includes a fine tune delay line 200 and a coarse tune delay line
210. Skew detection circuit 114 includes a phase comparator 220 which receives a
sample signal 152 from delay line controller 120 and generates a clock early signal 125
and a data early signal 127. In one embodiment, coarse tune delay line 210 adds
additional delay to the parallel data signals, as needed, in increments of the doubled
channel clock period.

In the embodiment shown in Figure 4, four bit shift register 230 receives data
from coarse tune delay line 210 and generates four bit nibbles representative of groups
of four bits received on channel data 105. Sampler 144 includes an even sample register
250 and an odd sample register 252. Each sample register is clocked with doubled
channel clock 132. In the embodiment shown, each group of eight bits is split into an
even nibble and an odd nibble. Even nibbles are stored in even sample register 250.
Odd nibbles are stored in odd sample register 252. In the embodiment shown, output
register 146 is a dual input register 260. Register 260 samples each of even sample
register 250 and odd sample register 252 in a ping pong fashion on alternate cycles of
core clock 150 to produce a four bit data out 265 synchronized to core clock 150.

In the embodiment shown in Figure 4, fine tune delay line 200 is controlled via
control line 202. In one embodiment, control line 202 includes an enable bit for channel
clock interface 130 and for each data capture circuit 110. In addition, control line 202
includes a three bit shift_mode signal driven to each of the data capture circuits 110 and
to clock interface 130. In one such embodiment, the three bit shift_mode signal and the
enable bit are used to control mode selection registers within each of the fine tune delay
lines 200, 160. In one embodiment, thermometer encoding is used within each of the
fine tune delay lines to configure delay. A more detailed description of fine tune delay
lines is provided in "A Programmable Differential Delay Circuit with Fine Tune
Adjustment", filed herewith, which is hereby incorporated by reference.
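
As a brief illustration of thermometer encoding in general, a setting of n is represented by n consecutive ones followed by zeros, so adjacent settings differ in exactly one bit. The sketch below is not taken from the incorporated application; the function name `thermometer_encode` and the register width used in the example are assumptions.

```python
def thermometer_encode(setting, width):
    """Encode an integer delay setting as a thermometer code.

    setting: number of delay increments to enable (0..width).
    width:   number of control bits (hypothetical; not given in the text).
    Returns a list of bits with `setting` ones followed by zeros.
    """
    if not 0 <= setting <= width:
        raise ValueError("setting must lie between 0 and width")
    return [1] * setting + [0] * (width - setting)


def thermometer_decode(bits):
    """Recover the setting by counting the ones in a thermometer code."""
    return sum(bits)
```

For example, `thermometer_encode(3, 8)` yields `[1, 1, 1, 0, 0, 0, 0, 0]`, and stepping to a setting of 4 changes only the fourth bit.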

Figure 5 illustrates timing relationships between channel data, handshake, and
clock at the transmit end and at the receive end. In the example shown, the shaded
signals are from the transmit side while the non-shaded signals are at the receive side.
Arrows 280 and 290 represent the earliest and latest point in time, respectively, at which
Data_Even and Data_Out can be sampled into the core of the device (given the premise
that the output of the two stage synchronizer must be a logic 0 to accept the Even/Odd
data).

Figure 6 provides an illustration of a simplified timing diagram showing how a
skew incident is found, according to the present invention. Doubled clock signal 132,
Clk2x, drives phase comparator 220, within skew detection circuit 114, with three
consecutive edges: an up 300, a down 310 and another up 320. The data signal is
sampled at each of the consecutive edges (300, 310, 320). For example, if high data is
captured on the first edge 300 of Clk2x and low data is captured on the third edge 320
of Clk2x, a transition on the data signal has occurred between those two
Clk2x edges. Note that, in this particular embodiment of the invention, the Clk2x signal
must be twice the frequency of the data signal and must be run with a 50% duty cycle.
By placing the clock and the data in this relationship, the rising edge will occur in the
middle of the data during a valid state, and the second Clk2x edge will occur during a
data transition, resulting in an uncertain sample, 330. That arrangement achieves
optimal positioning of the clock. As a result, if high data is captured on
the second edge 310 of Clk2x, the clock is early and a clock early signal
125 is activated. On the other hand, if low data is captured on the second edge of Clk2x,
then data is early and a data early signal 127 is activated. In other words, one samples
at three consecutive clock edges and, if the samples at the first and third edges differ, then the
data made a transition. By examining the data captured at the second edge one can
determine whether the clock was early or the data was early. This approach will
optimally position the clock edge even if the setup and hold requirements of the
capturing device are not identical.
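
The three-edge decision described above can be sketched in a few lines. This is an illustrative model only, not the circuit of Figs. 8a-c. The function name `phase_compare` is hypothetical, and extending the high-to-low example of the text to low-to-high transitions (by comparing the middle sample with the first and third samples) is an assumption.

```python
def phase_compare(first, second, third):
    """Classify one skew incident from three consecutive Clk2x samples.

    first, second, third: data values (0 or 1) captured at edges 300, 310
    and 320. Returns a (clock_early, data_early) pair of booleans.
    """
    if first == third:
        # No data transition between the first and third edges,
        # so this window carries no phase information.
        return (False, False)
    # The middle edge still sees the old data value: the clock edge
    # arrived before the data transition, so the clock is early.
    clock_early = (second == first)
    # The middle edge already sees the new data value: the data
    # transition arrived before the clock edge, so the data is early.
    data_early = (second == third)
    return (clock_early, data_early)
```

For the case given in the text (high on the first edge, low on the third), a high middle sample gives `(True, False)` and a low middle sample gives `(False, True)`.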

In one embodiment, bit deskew and clock centering circuitry is added to
independently center the capture clock within the center of each data eye. In one such
embodiment, deskew is achieved by adding additional delay to "early" arriving signals
so that they match the "latest" arriving signal.

In one embodiment, delay is added to the clock or data signals to position the
channel clock within the data eye. Delay line controller 120 maintains minimum
latency through the delay lines once this objective is met.

In the embodiment shown in Fig. 4, delay lines 112 include a fine tune delay line
200 and a coarse tune delay line 210. In one such embodiment, fine tune delay line 200
provides a minimum of 1.5 ns of fine tune deskew range in less than 90 ps step sizes.
Other increments could be used to offer greater or lesser degrees of fine tuning. In
addition, the number of fine tune stages could increase or decrease to provide more or
less than the 1.5 ns of fine tune deskew range.
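
As a quick check of the figures above, the smallest number of delay steps that covers a given range with a bounded step size can be computed directly. The helper name `min_fine_steps` is hypothetical and the actual number of stages in fine tune delay line 200 is not stated in this section; the sketch only works through the arithmetic implied by 1.5 ns and 90 ps.

```python
import math


def min_fine_steps(range_ps=1500, max_step_ps=90):
    """Smallest number of steps of at most max_step_ps that span range_ps."""
    return math.ceil(range_ps / max_step_ps)
```

With the quoted values, 1500 / 90 is about 16.7, so at least 17 steps are needed.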

In one embodiment, fine tune delay line 200 includes a number of differential
delay circuits. In the embodiment described in the patent application entitled "A
Programmable Differential Delay Circuit with Fine Tune Adjustment" discussed above,
an internal multiplexing scheme eases many timing and physical design concerns
encountered when selecting between tap points distributed along a long delay line.

Attorney Docket 499.028US1    Client File No. 15-4-932.00

In one embodiment, coarse tune delay line 210 provides a frequency dependent
amount of additional delay (1, 2, or 3 clock cycles) which corresponds to a range of 2.5
ns at signaling rates of 800 Mb/s. The coarse tuning technique uses the frame signal
shown in Fig. 5 as a reference and can deskew ± one clock cycle of delay variation with
respect to the signal. In a bidirectional signaling embodiment, two independent frame
signals traveling in opposite directions are used.
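
The 2.5 ns figure can be reproduced with a short calculation, assuming that one coarse tune clock cycle equals one bit time (the doubled channel clock provides two edges per data bit) and that the range is the spread between the smallest and largest coarse delay. Both readings are assumptions made for this sketch, as is the helper name `coarse_range_ns`.

```python
def coarse_range_ns(rate_mbps=800, min_cycles=1, max_cycles=3):
    """Spread of coarse delay, in ns, between min_cycles and max_cycles."""
    bit_time_ns = 1000 / rate_mbps  # 1.25 ns per bit at 800 Mb/s
    return (max_cycles - min_cycles) * bit_time_ns
```

At 800 Mb/s the bit time is 1.25 ns, and two cycles of spread give 2.5 ns; at a lower signaling rate the same cycle counts cover a proportionally larger range, which is the sense in which the delay is frequency dependent.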

In one embodiment, channel clock 115 is nominally delayed from channel data
105 by half of a bit duration. In one such embodiment, this delay takes place on the
transmit side of the link either by launching channel clock 115 off of the opposite edge
of the transmit clock than that used to launch channel data 105 or by launching clock
115 and data 105 off of the same transmit clock and then delaying clock 115 with
additional PCB foil trace length.

In one embodiment, phase comparator 220 is a digital sample and hold phase
comparator used to establish the phase relationship between doubled channel clock 132
and fine tuned deskewed data 204. Since, as is noted above, any individual phase
comparison would be subject to significant error due to data edge jitter, a minimum of
256 samples are required before an updated estimate of data "early" or "late" can be
made.

In one embodiment, an initial training sequence is required to deskew and center
the data and clock. To facilitate this, in one such embodiment, the channel protocol
includes an initial start-up sequence. The initial start-up sequence provides a
sufficiently long sequence of data edges to guarantee that delay line controller 120 can
deskew the data using fine tune delay line 200.

At the end of the start-up sequence, a one-time coarse tune sequence is initiated.
The coarse tune sequence is required because the phase comparator has phase ambiguity
if channel clock 115 is skewed from data 105 by more than ± Tbit/2. In other words,
phase comparator 220 cannot distinguish whether the Nth clock edge is being compared
to the Nth data eye or the (N-1)th or (N+1)th data eyes.

To counter this, in one embodiment, the one-time coarse tuning sequence is used
to re-align all data bits which have slipped beyond the resolution of phase comparator
220. In one embodiment, logic within the frame data bit slice is designed to detect a
unique coarse tuning sequence (e.g., '110011') sent on the incoming frame signal. Upon
detection, a CTUNE pulse is generated and fanned out to all the data bit slices, data
ready and frame. The CTUNE pulse delays the incoming data by one, two or three
doubled channel clocks 132 prior to entering the serial to parallel converter, after
determining if the data is early, nominal or late with respect to the CTUNE pulse. An
example of this correction is shown in Fig. 7.

If none of the slices has late arriving data leading to cycle slip (determined, e.g.,
by a logical OR of all the data, data ready and frame 'late' signals), then, in one
embodiment, all the data travels through one less coarse tune flip flop of delay to reduce
the overall latency by one doubled channel clock cycle.

In the embodiment discussed above, circuitry in coarse tuning delay circuit 210
can be used to deskew all data bits as long as there is not more than one clock cycle slip
in either direction between any individual data or data ready bit relative to the frame
signal (the frame signal acts as a coarse tune reference point). This range can easily be
increased to any arbitrary limit with additional circuitry.
In one embodiment, each data, data-ready and frame signal is deskewed by a
separate bit slice deskew circuit 110. Phase comparators 220 within each bit slice
produce an output which indicates whether doubled channel clock 132 is early or late
with respect to the optimal clock position. A simplified diagram of phase comparator
220 is shown in Figs. 8a-c. Phase comparator 220 requires a 50% duty cycle clock with
two edges per data bit. Doubled channel clock 132 provides such a clock. In one
embodiment, phase comparator 220 includes flip-flops 440, 442 and 444. These
flip-flops match flip-flops in data capture circuit 110 so that phase comparator 220 can
properly position clock 132 in the data eye independent of the setup and hold
requirements of the data capture flip-flop. In one such embodiment, phase comparator
220 also includes logic (not shown) to hold the first phase comparison that occurs after
the sample input signal goes active. Each sampling window is 16 bits wide. Therefore,
consecutive comparisons should not be subject to cycle-to-next-cycle correlations.

In one embodiment, delay line controller 120 includes circuitry to adaptively
deskew delays between all data, data ready and frame bits and to optimally position
capture clock 132 between opening and closing edges of the data eye. The deskew
circuitry continuously monitors phase comparators 220 inside all data bit slices and
periodically adjusts the tap settings of data and clock fine tune delay lines (200, 160) to
optimally position the sampling clock. Controller 120 maintains minimum latency
through delay lines 200 and 160 to minimize jitter added by the delay lines themselves.

An overview of a feedback control system which can be used to control the Data,
Data_Ready, Frame, and Clock delay lines is shown in Fig. 9.

As can be seen in Fig. 9, at reset, control moves to 400 and all
Data_Minus_Clock (DMC) delay value registers are set to 0. In addition, the tap
settings in each delay line 200, 160 are reset to add the minimum delay. Control then
moves to 402, wherein the clock vs. data phase relationship is analyzed for each bit slice
(data, data_ready and frame signal each have their own bit slice). If filtered "clock
early" is detected from any given bit slice, control moves to 404 and the corresponding
DMC register is decremented by one. Control then moves to 406.

If, however, filtered "data early" is detected from any given bit slice, control 

20 moves to 408 and the corresponding DMC register is incremented by one. Control then 

moves to 406. 

At 406 a determination is made of the minimum DMC value across all the bit 
slices. If the minimum DMC value is greater than or equal to zero, control moves to 
410 and the clock delay is set to the minimum clock delay. Control then moves to 414. 
25 If, however, the minimum DMC value is less than zero, control moves to 412 

and the clock delay is set to the increment corresponding to the absolute value of the 
minimum DMC value. Control then moves to 414. 

At 414, each bit slice delay line 200 is set to delay its data signal by the 
difference between its DMC value and the minimum DMC value. Control then moves 
30 to 402 and the process begins again. 
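The control flow of Fig. 9 can be sketched in software as follows. This is a minimal illustration that follows the steps as worded above, not the controller hardware itself. The function names are hypothetical, delays are assumed to be whole tap counts, and the "minimum clock delay" is represented as zero added taps.

```python
from typing import List, Optional, Tuple


def update_dmc(dmc: List[int], filtered: List[Optional[str]]) -> List[int]:
    """Steps 402-408: adjust each bit slice's DMC register by at most one."""
    updated = list(dmc)
    for i, result in enumerate(filtered):
        if result == "clock early":
            updated[i] -= 1  # step 404
        elif result == "data early":
            updated[i] += 1  # step 408
    return updated


def tap_settings(dmc: List[int]) -> Tuple[int, List[int]]:
    """Steps 406-414: derive the clock delay and the per-slice data delays."""
    min_dmc = min(dmc)  # step 406
    # Step 410 (minimum clock delay, modelled as 0) or step 412 (|min DMC|).
    clock_delay = 0 if min_dmc >= 0 else abs(min_dmc)
    # Step 414: each data delay is its DMC value minus the minimum DMC value.
    data_delays = [value - min_dmc for value in dmc]
    return clock_delay, data_delays


# Reset (step 400) for three bit slices, followed by one pass of the loop.
dmc = update_dmc([0, 0, 0], ["clock early", None, "data early"])
clock_delay, data_delays = tap_settings(dmc)
```

Because the minimum DMC value is subtracted from every slice, at least one data delay line always sits at its minimum setting, which matches the stated goal of maintaining minimum latency through the delay lines.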



Attorney Docket 499.028US1 



Client File No. 15-4-932.00 



Since, as is noted above, any given phase comparison is subject to data edge
jitter (i.e., noise which may exceed ± 200 ps), many samples are observed before an
estimate of the relative channel clock/channel data relationship is made. In one
embodiment, such as is shown in Figs. 4 and 10, a digital filter 262 can be used in delay
line controller 120 to compute an estimate of the data-clock phase relationship for each
data slice by computing a running accumulation (with fading memory) of the individual
"clock early" and "data early" comparisons for each data and data ready signal. In one
embodiment, a separate digital filter 262 is provided for each data and data ready signal.
The filter of Fig. 10 implements the recursive relationship
$Y_k = ACC_k + \tfrac{1}{2} Y_{k-1}$,
where $ACC_k$ is the accumulated sum of approximately the last 128 samples. Filtered
outputs 460 and 462 go active only if $Y_k$ overflows or underflows (this should require a
minimum of approximately 256 samples from when $Y_0 = 0$). The benefit of the digital
filter is that the noise is sampled a minimum of 256 times before a new phase
estimate is made. Since the standard deviation of the average of N independent samples
of a random variable is 1/sqrt(N) times as large as the standard deviation of a single
sample, filtering a large number of samples dramatically reduces the error associated
with data edge jitter.
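A fading-memory accumulator of this kind might be modelled as below. The sketch reads the recursion as Y_k = ACC_k + Y_(k-1)/2 and rests on several assumptions that are not taken from Fig. 10: the recursion is applied once per block of 128 counted comparisons, a "data early" comparison counts as +1 and a "clock early" comparison as -1, the overflow threshold is 192, and Y is cleared after an output fires. The class and parameter names are hypothetical.

```python
from typing import Optional


class FadingFilter:
    """Block-wise model of Y_k = ACC_k + Y_(k-1)/2 (assumed parameters)."""

    def __init__(self, block: int = 128, threshold: int = 192) -> None:
        self.block = block          # comparisons per ACC_k (from the text)
        self.threshold = threshold  # assumed overflow/underflow limit
        self.acc = 0                # running sum for the current block
        self.count = 0              # comparisons seen in the current block
        self.y = 0.0                # filter state, Y_0 = 0

    def sample(self, comparison: Optional[str]) -> Optional[str]:
        """Feed one raw comparison; return a filtered decision or None."""
        if comparison is None:
            return None  # no data transition: the filter state is unchanged
        if comparison == "data early":
            self.acc += 1
        elif comparison == "clock early":
            self.acc -= 1
        else:
            raise ValueError("unknown comparison: " + repr(comparison))
        self.count += 1
        if self.count < self.block:
            return None
        # End of block: apply the recursion, then start a new block.
        self.y = self.acc + 0.5 * self.y
        self.acc = 0
        self.count = 0
        if self.y >= self.threshold:
            self.y = 0.0  # assumed: state is cleared once an output fires
            return "data early"
        if self.y <= -self.threshold:
            self.y = 0.0
            return "clock early"
        return None
```

With these assumed values, a stream of unanimous comparisons gives Y = 128 after the first block and Y = 192 after the second, so the earliest possible output occurs after 256 samples, while evenly mixed comparisons never trip the filter.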

If there are not a sufficient number of data transitions, filter 262 will not allow
the delay line to change state. In one embodiment, fine tune delay line 200 can update
in as little as Tclk*1024 = 5 ns*1024, or 5.12 us. An individual update can
cause the data delay to move relative to the clock delay by +/- one tap setting (45 ps or
90 ps increments for best case (BC) or worst case (WC), respectively). Deskewing
1250 ps of skew one tap setting at a time (BC) will require approximately 150 us,
assuming sufficient data transitions. This should be adequate for tracking delay
variations due to environmental factors such as voltage and temperature.
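The timing estimate can be checked with a few lines of arithmetic. The sketch below uses only the figures quoted in the text (5 ns clock period, 1024 clocks per update, 45 ps or 90 ps per tap, 1250 ps of skew); the function and constant names are hypothetical, and it assumes one tap step per update with the tap count rounded up.

```python
import math
from typing import Tuple

T_CLK_NS = 5.0        # clock period quoted in the text
UPDATE_CYCLES = 1024  # clocks per delay line update quoted in the text


def deskew_time_us(skew_ps: float, tap_ps: float) -> Tuple[int, float]:
    """Return (tap steps needed, total time in microseconds)."""
    update_us = T_CLK_NS * UPDATE_CYCLES / 1000.0  # 5.12 us per update
    taps = math.ceil(skew_ps / tap_ps)             # one tap step per update
    return taps, taps * update_us


best_case = deskew_time_us(1250, 45)   # 45 ps taps
worst_case = deskew_time_us(1250, 90)  # 90 ps taps
```

For 45 ps taps this gives 28 steps and roughly 143 us, in line with the figure of about 150 us; with 90 ps taps the same skew takes 14 steps, or roughly 72 us.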

Figure 11 is a block diagram illustrating an electronic data processing system
500 constructed to take advantage of the present invention. Electronic data processing
system 500 includes two or more electronic devices 510, 530 (e.g., a processor unit 510
connected to a memory device 530) connected by a communication interface 540
having a signal deskewing circuit 520 as described in detail above in
connection with Figures 1-10. In one embodiment, interface 540 includes two or
more channel data lines and a separate channel clock line. In one such embodiment,
system 500 is implemented on a single semiconductor wafer. In an alternative
embodiment, device 510 and device 530 are implemented as two separate integrated
circuits.

In one embodiment, each of the devices 510, 530 includes an integral
communications interface; each communications interface includes a signal deskewing
circuit 100 (not shown) as described in detail above in connection with
Figures 1-4.

Conclusion 

Thus, novel structures and methods for reducing the skew on signals transmitted
between electrical components, while reducing both the engineering and material costs
of achieving low skew in data signals, have been described.

When transferring parallel data across a data link, variations in data path delay
or an imperfectly positioned capture clock edge limit the maximum rate at which data
can be transferred. Consequently, a premium is spent in engineering design time and
material cost to realize low skew data links with the proper clock-data phase
relationship. In one particular area, electrical cables, some have been paying a very
high premium for low skew properties. This invention should dramatically relax the
low skew requirement of such cables and consequently reduce costs, as the cables
become easier to manufacture and more than a single vendor can produce them. One
should also expect to achieve faster communication rates with this invention, and thus
to command higher premiums on products that implement it.

In one embodiment, the present invention compensates for differences in data
path delay by adding delay to the early-arriving signals until they match the
delay of the latest-arriving signal. Furthermore, if the clock which is to capture this data
is early or late with respect to an optimal quadrature placement (depending on latch
setup/hold requirements), additional data or clock path delay is added to optimally
position all data with respect to the capturing clock.

This can be strategically important because it affords a way either to
dramatically cut costs or to achieve higher performance in an area where many in the
affected industries could not do so without equivalent functionality. Much of system
cost is based on commodity parts (e.g., microprocessors, memory), for which most
industry participants pay an equal price. In areas where one uses unique parts (e.g.,
cables), it is therefore a strong advantage to be able to find much less expensive
solutions to the problem of variations in data path delay when transferring parallel data
across a data link, in order to command higher product margins.

Although specific embodiments have been illustrated and described herein, it
will be appreciated by those of ordinary skill in the art that any arrangement which is
calculated to achieve the same purpose may be substituted for the specific embodiment
shown. This application is intended to cover any adaptations or variations of the present
invention. Therefore, it is manifestly intended that this invention be limited only by the
claims and the equivalents thereof.